The other day I came across a small problem: I was investigating a dataset, and the variables clearly showed a non-linear relationship. Consequently, I could not apply classical linear regression.
An approach to solve this kind of problem is LOESS regression, which stands for locally weighted scatterplot smoothing.
In the following, we will work through an artificial example to make the concept clear and show how to apply it in R.
We first create a data frame with two variables, a and y, which are non-linearly related. The plot shows this relationship.
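This is the create_data chunk from the full code at the end of the post; since the noise is random, the exact values will differ between runs:

y <- seq(from=1, to=10, length.out=100)
a <- y^3 + y^2 + rnorm(100, mean=0, sd=30)   # non-linear relationship plus noise
data <- data.frame(a=a, y=y)
plot(y=y, x=a)                               # scatterplot of the raw data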
We perform two regressions, one linear and one loess.
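Both fits are one-liners: lm() fits the linear model and loess() the local regression with its defaults (span = 0.75, degree = 2, as the summary below confirms):

linreg <- lm(y ~ a)
summary(linreg)
loess <- loess(y ~ a)
summary(loess)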
## 
## Call:
## lm(formula = y ~ a)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.303 -0.635  0.294  0.792  1.635 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.100335   0.136374    22.7   <2e-16 ***
## a           0.007645   0.000305    25.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.973 on 98 degrees of freedom
## Multiple R-squared:  0.865,  Adjusted R-squared:  0.864 
## F-statistic:  630 on 1 and 98 DF,  p-value: <2e-16
## Call:
## loess(formula = y ~ a)
## 
## Number of Observations: 100 
## Equivalent Number of Parameters: 4.56 
## Residual Standard Error: 0.587 
## Trace of smoother matrix: 4.98 
## 
## Control settings:
##   normalize:  TRUE 
##   span     :  0.75 
##   degree   :  2 
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2
The scatterplot clearly shows the better fit from the loess regression.
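The plot is created with scatter.smooth(), which draws the points together with the LOESS smoother, and abline(), which adds the linear fit for comparison:

scatter.smooth(data)          # points plus LOESS curve
abline(linreg, col="blue")    # linear fit in blue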
The summaries show one difference between linear and LOESS regression: the LOESS summary does not report regression coefficients, since it is a local regression. The classical measure of goodness of fit is R².
The linear model has an R² of 0.865. For the LOESS model we have to calculate the R² ourselves; a common proposal is to use the squared correlation between the observed and the fitted values.
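With the fitted values from predict(), the calculation is a one-liner:

hat <- predict(loess)            # fitted values of the LOESS model
(r_sq_loess <- cor(y, hat)^2)    # squared correlation between observed and fitted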
## [1] 0.9528
The R² of the LOESS regression is 0.953 and thus very good, and better than the R² of the linear regression.
The plot of the residuals versus the fitted/predicted values also indicates a much better fit of the LOESS regression compared to the linear regression: the residual plot of the linear regression shows a clear structure, which we do not want. The QQ-plots show some issues in the tails.
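The diagnostic plots are created with the standard base-graphics functions, arranged in a 2x2 grid:

par(mfrow=c(2,2))
plot(linreg$fitted, linreg$residuals, main="classical linear regression")
plot(loess$fitted, loess$residuals, main="LOESS")
# normal probability plots of the residuals
qqnorm(linreg$residuals, ylim=c(-2,2))
qqline(linreg$residuals)
qqnorm(loess$residuals, ylim=c(-2,2))
qqline(loess$residuals)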
But what do we do with it now? Finally, we probably want to predict some values from this exercise.
The predict function helps us with this exercise.
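We build a small data frame with three new values of a and ask both models for predictions (the data frame is called predict in the full code below, which works but shadows the name of the predict() function):

predict <- data.frame(a=c(10, 400, 900))       # new values of a
predict$linreg <- predict(linreg, predict)     # predictions of the linear model
predict$loess <- predict(loess, predict)       # predictions of the LOESS model
predict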
## a linreg loess
## 1 10 3.177 2.265
## 2 400 6.158 7.133
## 3 900 9.981 9.365
Alright, this was a quick introduction to applying LOESS regression in R.
And here is the full code:
## @knitr create_data
y <- seq(from=1, to=10, length.out=100)
a <- y^3 + y^2 + rnorm(100, mean=0, sd=30)
data <- data.frame(a=a, y=y)
plot(y=y, x=a)

## @knitr linreg
linreg <- lm(y ~ a)
summary(linreg)
loess <- loess(y ~ a)
summary(loess)
scatter.smooth(data)
abline(linreg, col="blue")

## @knitr loess_fit
hat <- predict(loess)
plot(y ~ a)
lines(a[order(a)], hat[order(a)], col="red")   # LOESS fit, ordered by a
(r_sq_loess <- cor(y, hat)^2)

## @knitr fit_loess
par(mfrow=c(2,2))
plot(linreg$fitted, linreg$residuals, main="classical linear regression")
plot(loess$fitted, loess$residuals, main="LOESS")
# normal probability plot
qqnorm(linreg$residuals, ylim=c(-2,2))
qqline(linreg$residuals)
qqnorm(loess$residuals, ylim=c(-2,2))
qqline(loess$residuals)

## @knitr predict
predict <- data.frame(a=c(10,400,900))
scatter.smooth(data)
abline(linreg, col="blue")
predict$linreg <- predict(linreg, predict)
predict$loess <- predict(loess, predict)
predict
points(x=predict$a, y=predict$linreg, col="blue", pch=18, cex=2)
points(x=predict$a, y=predict$loess, col="red", pch=18, cex=2)